Tied-Mixture Language Modeling in Continuous Space
Abstract
This paper presents a new perspective on the language modeling problem by moving word representations and modeling into continuous space. In previous work we introduced the Gaussian-Mixture Language Model (GMLM) and presented some initial experiments. Here, we propose the Tied-Mixture Language Model (TMLM), which does not have the model parameter estimation problems that GMLM has. TMLM provides a great deal of parameter tying across words and hence achieves robust parameter estimation. As such, TMLM can estimate the probability of any word that has as few as two occurrences in the training data. Speech recognition experiments with the TMLM show improvement over the word trigram model.
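The abstract does not give the model equations, so the following is only a minimal sketch of the generic tied-mixture idea it refers to: all words share one pool of Gaussians, and each word keeps only its own mixture weights. The feature vector `x`, the diagonal covariances, the function names, and all numeric values are illustrative assumptions, not the authors' formulation.

```python
import numpy as np

def log_gaussian_diag(x, means, variances):
    """Log density of x under each of K diagonal-covariance Gaussians.

    x: (d,), means: (K, d), variances: (K, d). Returns shape (K,).
    """
    d = x.shape[0]
    diff = x - means
    return -0.5 * (d * np.log(2.0 * np.pi)
                   + np.sum(np.log(variances), axis=1)
                   + np.sum(diff ** 2 / variances, axis=1))

def tied_mixture_log_likelihood(x, word_weights, means, variances):
    """log p(x | word) = log sum_k c[word, k] * N(x; mu_k, var_k).

    The Gaussians (means, variances) are shared by every word;
    only word_weights (shape (K,), positive, summing to 1) is word-specific.
    """
    a = np.log(word_weights) + log_gaussian_diag(x, means, variances)
    m = np.max(a)  # log-sum-exp for numerical stability
    return m + np.log(np.sum(np.exp(a - m)))

# Hypothetical shared pool of three 2-D Gaussians (values are made up).
means = np.array([[0.0, 0.0], [2.0, 2.0], [-2.0, 1.0]])
variances = np.ones((3, 2))

# Each row holds one word's mixture weights over the shared pool.
weights = np.array([[0.7, 0.2, 0.1],
                    [0.1, 0.1, 0.8]])

x = np.array([1.9, 2.1])  # assumed continuous feature vector
for word_id, c in enumerate(weights):
    print(word_id, tied_mixture_log_likelihood(x, c, means, variances))
```

Because the Gaussians are shared, a word seen only a few times needs just a handful of mixture weights estimated from its own data, which is one way to read the abstract's claim about robust estimation for rare words.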
Related resources
Segment-Based Acoustic Models for Continuous Speech Recognition
Tied-mixture (or semi-continuous) distributions are an important tool for acoustic modeling, used in many high-performance speech recognition systems today. This paper provides a... ...ity of acoustic observations conditioned on the state in hidden-Markov models (HMM), or for the case of the SSM, conditioned on a region of the model. Some of the options that have been investigated include discrete dis...
On The Use Of Tied-Mixture Distributions
Tied-mixture (or semi-continuous) distributions are an important tool for acoustic modeling, used in many high-performance speech recognition systems today. This paper provides a survey of the work in this area, outlining the different options available for tied-mixture modeling, introducing algorithms for reducing training time, and providing experimental results assessing the trade-offs for sp...
Improved context-dependent acoustic modeling for continuous Chinese speech recognition
This paper describes a new framework for context-dependent (CD) Initial/Final (IF) acoustic modeling using decision-tree-based state tying for continuous Chinese speech recognition. The Extended Initial/Final (XIF) set, which outperforms the standard IF set, is chosen as the basic speech recognition unit (SRU) set according to the characteristics of the Chinese language. An adaptive mixture increa...
Improved Bayesian Training for Context-Dependent Modeling in Continuous Persian Speech Recognition
Context-dependent modeling is a widely used technique for better phone modeling in continuous speech recognition. While different types of context-dependent models have been used, triphones are known to be the most effective. In this paper, a Maximum a Posteriori (MAP) estimation approach is used to estimate the parameters of the untied triphone model set used in data-driven clust...
Phonetic state tied-mixture tone modeling for large vocabulary continuous Mandarin speech recognition
This paper presents a new approach to tone modeling for continuous Mandarin speech recognition. Mandarin tones provide rich information for speech recognition. In this paper, we treat the tone as an attribute of the final vowel part of a Mandarin syllable. Separate distributions are estimated for cepstral coefficients and pitch features respectively, and the phonetic state tied-mixture techniqu...